One Billion Transistors, One Uniprocessor, One Chip

Authors

  • Yale N. Patt
  • Sanjay J. Patel
  • Marius Evers
  • Daniel H. Friendly
  • Jared Stark
Abstract

A billion transistors on a single chip presents opportunities for new levels of computing capability. Depending on your design point, several distinct implementations are possible. In our view, if the design point is performance, the implementation of choice is a very high-performance uniprocessor on each chip, with chips interconnected to form a shared memory multiprocessor. The basic design problem is partitioning: How should the functionality be partitioned to deliver the highest performance computing system? Because one billion transistors falls far short of an infinite number, the highest performance computing system will not fit onto a single chip. We must decide what functionality cannot tolerate the latency of interchip communication. To do otherwise precludes obtaining the highest performance. Intraprocessor communication, to be effective, must keep latency to a minimum, locating on the same chip as many as possible of the structures necessary to support a high-performance uniprocessor. These structures include those necessary both for aggressive speculation, such as a very aggressive dynamic branch predictor, and for very-wide-issue superscalar processing, such as

  • a large trace cache,
  • a large number of reservation stations,
  • a large number of pipelined functional units,
  • sufficient on-chip data cache, and
  • sufficient resolution and forwarding logic.

A reasonable on-chip specification would issue a maximum of 16 or 32 instructions per cycle (issue width), include reservation stations to accommodate 2,000 instructions, and include 24 to 48 highly optimized, pipelined functional units. We believe that the effectiveness of these structures continues to scale as the number of transistors on a chip increases. In our view, we will run out of transistors before we run out of functionality in support of a single instruction stream. Ergo, one billion transistors, one uniprocessor, one chip.

History argues that such an engine could never run at peak performance—that diminishing returns are inevitable. We disagree: Ingrained in the model is the flexibility of dynamic scheduling, coupled with the structures required to exploit it. True, this will require better algorithms to solve the application problems, better compiler optimizations, and better CAD tools to manage the great increase in design complexity. But one billion transistors on a chip is still a decade out, and the industry has talented people working on all these fronts. Even if diminishing returns does prove to be the case, we suggest that it is still better to combine higher performance uniprocessor chips—where higher latency interchip …

Similar resources

A Fast Transient Simulation Strategy by Positively Utilizing On-Chip Inductance

With the progress of semiconductor manufacturing technology, the number of transistors integrated in one chip reaches more than 1 billion. The speeding-up of the operating frequency and the scaling of the fabrication process expose many problems, such as the interconnect delay due to the parasitic capacitance, inductance and resistance, and the manufacturing variation. Therefore, physical desig...

Atlas: a Dynamically Parallelizing Chip-multiprocessor for Gigascale Integration

Single-chip multiprocessors are an important research direction for future microprocessors. The stigma of this approach is that many important applications cannot be automatically parallelized. This chapter presents a single-chip multiprocessor that engages aggressive speculation techniques to enable dynamic parallelization of irregular, sequential binaries. Thread speculation (multiscalar exec...

Simultaneous multithreading: a platform for next-generation processors

With the dizzying pace of semiconductor technology development, CPU designers are squeezing previously unimaginable amounts of hardware onto a single chip. Over the next 15 years we can expect the number of transistors on a chip to increase by two orders of magnitude, to a billion transistors. The obvious question, then, is how to use these transistors. One possibility is to add more memory (ei...

On-Chip Networks for Multicore Systems

With Moore’s law supplying billions of transistors, and uniprocessor architectures delivering diminishing performance, multicore chips are emerging as the prevailing architecture in both general-purpose and application-specific markets. As the core count increases, the need for a scalable on-chip communication fabric that can deliver high bandwidth is gaining in importance, leading to recent mu...

Performance, Power Efficiency and Scalability of Asymmetric Cluster Chip Multiprocessors

This paper evaluates asymmetric cluster chip multiprocessor (ACCMP) architectures as a mechanism to achieve the highest performance for a given power budget. ACCMPs execute serial phases of multithreaded programs on large high-performance cores whereas parallel phases are executed on a mix of large and many small simple cores. Theoretical analysis reveals a performance upper bound for symmetric mult...

Journal title:
  • IEEE Computer

Volume 30  Issue 

Pages  -

Publication date 1997